Definition 19.1 (Expectation) The expectation (also called the expected value or mean) of a random variable X is its probability-weighted average value.
\mathbb{E}_{X} [X]
= \sum_{x \in \mathbb{R}} x \mathbb{P}_{X} (x)
= \sum_{\omega \in \Omega} X (\omega) \mathbb{P} (\omega).
By definition, the expectation of a constant scalar a is the scalar itself
\mathbb{E}_{X} [a] = a.
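As a concrete illustration, the following minimal Python sketch computes the expectation as a probability-weighted sum; the pmf of a fair six-sided die used here is only an illustrative assumption.

```python
# Expectation of a discrete random variable as a probability-weighted sum,
# using a fair six-sided die as an illustrative pmf.
pmf = {x: 1 / 6 for x in range(1, 7)}  # P_X(x) for x = 1, ..., 6

expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # ≈ 3.5

# The expectation of a constant is the constant itself.
a = 4.0
print(sum(a * p for p in pmf.values()))  # ≈ 4.0
```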
Corollary 19.1 (Linearity of expectation) For any random variables X, Y: \Omega \to \mathbb{R} that are defined on the same probability space (\Omega, \mathcal{F}, \mathbb{P}), we have
\mathbb{E}_{X, Y} [X + Y] = \mathbb{E}_{X} [X] + \mathbb{E}_{Y} [Y],
and for any scalars a, b \in \mathbb{R}
\mathbb{E}_{X} [a X + b] = a \mathbb{E}_{X} [X] + b.
For the first property, we have
\begin{aligned}
\mathbb{E}_{X, Y} [X + Y] & = \sum_{\omega \in \Omega} (X + Y) (\omega) \mathbb{P} (\omega)
\\
& = \sum_{\omega \in \Omega} (X (\omega) + Y (\omega)) \mathbb{P} (\omega)
\\
& = \sum_{\omega \in \Omega} X (\omega) \mathbb{P} (\omega) + \sum_{\omega \in \Omega} Y (\omega) \mathbb{P} (\omega)
\\
& = \mathbb{E} [X] + \mathbb{E} [Y],
\end{aligned}
and for the second property
\begin{aligned}
\mathbb{E} [aX + b]
& = \sum_{\omega \in \Omega} (aX + b) (\omega) \mathbb{P} (\omega)
\\
& = \sum_{\omega \in \Omega} (aX (\omega) + b) \mathbb{P} (\omega)
\\
& = \sum_{\omega \in \Omega} aX (\omega) \mathbb{P} (\omega)
+ \sum_{\omega \in \Omega} b \mathbb{P} (\omega)
\\
& = a \sum_{\omega \in \Omega} X (\omega) \mathbb{P} (\omega)
+ b \sum_{\omega \in \Omega} \mathbb{P} (\omega)
\\
& = a\mathbb{E} [X] + b
& [\sum_{\omega \in \Omega} \mathbb{P} (\omega) = 1].
\end{aligned}
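The following Python sketch checks linearity numerically on a small finite sample space; the sample space and the definitions of X and Y are illustrative assumptions.

```python
# Numerical check of linearity of expectation on a small finite sample space.
import itertools

omega = list(itertools.product([0, 1], repeat=2))  # four equally likely outcomes
prob = {w: 1 / len(omega) for w in omega}          # P(omega)

X = lambda w: w[0] + 2 * w[1]                      # illustrative random variables
Y = lambda w: 3 * w[0] - w[1]

def E(f):
    """Expectation of a random variable f defined on the sample space."""
    return sum(f(w) * prob[w] for w in omega)

a, b = 2.0, 5.0
assert abs(E(lambda w: X(w) + Y(w)) - (E(X) + E(Y))) < 1e-12
assert abs(E(lambda w: a * X(w) + b) - (a * E(X) + b)) < 1e-12
```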
The “law of the unconscious statistician” states that the expectation of a transformed random variable can be found without finding the probabilities of the transformed random variable, simply by applying the probability weights of the original random variable to the transformed values.
Corollary 19.2 (Law of the unconscious statistician (LOTUS)) The expectation of a random variable Y = g (X) that is a function of another random variable X can be computed directly from the distribution of X:
\mathbb{E}_{Y} [Y]
= \sum_{y \in \mathbb{R}} y \mathbb{P}_{Y} (y)
= \sum_{x \in \mathbb{R}} g (x) \mathbb{P}_{X} (x)
= \mathbb{E}_{X} [g (X)].
Since the probability distribution of Y is \mathbb{P}_{Y} (y) = \sum_{x \in \Omega_X : g(x) = y} \mathbb{P}_{X} (x), where \Omega_X and \Omega_Y denote the sets of values taken by X and Y, we have
\begin{aligned}
\sum_{y \in \Omega_Y} y \mathbb{P}_Y (y)
& = \sum_{y \in \Omega_Y} y \sum_{x \in \Omega_X : g(x)=y} \mathbb{P}_X (x)
\\
& = \sum_{y \in \Omega_Y} \sum_{x \in \Omega_X : g(x)=y} y \mathbb{P}_X (x)
\\
& = \sum_{y \in \Omega_Y} \sum_{x \in \Omega_X : g(x)=y} g(x) \mathbb{P}_X (x)
\\
& = \sum_{x \in \Omega_X} g(x) \mathbb{P}_X (x)
\\
& = \mathbb{E} [g(X)]
\end{aligned}
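The identity can also be checked numerically. The sketch below (the pmf of X and the transformation g are illustrative assumptions) computes E[Y] once from the pushed-forward pmf of Y = g(X) and once by weighting g(x) with the pmf of X.

```python
# Numerical check of LOTUS with a non-injective transformation g.
from collections import defaultdict

pmf_X = {-2: 0.2, -1: 0.3, 0: 0.1, 1: 0.25, 2: 0.15}
g = lambda x: x ** 2

# Left-hand side: build P_Y by pushing the probability mass of X through g.
pmf_Y = defaultdict(float)
for x, p in pmf_X.items():
    pmf_Y[g(x)] += p
lhs = sum(y * p for y, p in pmf_Y.items())

# Right-hand side: apply the weights of X directly to the transformed values.
rhs = sum(g(x) * p for x, p in pmf_X.items())
assert abs(lhs - rhs) < 1e-12
```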
Corollary 19.3 The expectation of the product of independent random variables is the product of their individual expectations
\mathbb{E}_{X, Y} [X Y] = \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y].
By Definition 19.1 and Corollary 19.2, we have
\begin{aligned}
\mathbb{E}_{X, Y} [X Y]
& = \sum_{x \in \mathbb{R}} \sum_{y \in \mathbb{R}} x y \mathbb{P}_{X, Y} (x, y)
\\
& = \sum_{x \in \mathbb{R}} \sum_{y \in \mathbb{R}} x y \mathbb{P}_{X} (x) \mathbb{P}_{Y} (y)
& [X, Y \text{ independent}]
\\
& = \sum_{x \in \mathbb{R}} x \mathbb{P}_{X} (x) \sum_{y \in \mathbb{R}} y \mathbb{P}_{Y} (y)
\\
& = \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y].
\end{aligned}
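A quick numerical check of Corollary 19.3, under an illustrative pair of marginal pmfs and the factorization P_{X,Y}(x, y) = P_X(x) P_Y(y) that independence guarantees:

```python
# Numerical check that E[XY] = E[X] E[Y] when X and Y are independent.
pmf_X = {0: 0.5, 1: 0.3, 2: 0.2}
pmf_Y = {-1: 0.4, 1: 0.6}

# Independence: the joint pmf is the product of the marginals.
E_XY = sum(x * y * px * py for x, px in pmf_X.items() for y, py in pmf_Y.items())
E_X = sum(x * p for x, p in pmf_X.items())
E_Y = sum(y * p for y, p in pmf_Y.items())
assert abs(E_XY - E_X * E_Y) < 1e-12
```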
Conditional expectation
Definition 19.2 (Conditional expectation) Let X, Y be jointly distributed random variables. Then the conditional expectation of X given the event that Y = y is
\mathbb{E}_{X \mid Y} [X \mid y] = \sum_{x \in \mathbb{R}} x \mathbb{P}_{X \mid Y} (x \mid y),
which is a function of y.
Using Corollary 19.2, the conditional expectation of a transformed random variable g (X) is
\mathbb{E}_{X \mid Y} [g (X) \mid y] = \sum_{x \in \mathbb{R}} g (x) \mathbb{P}_{X \mid Y} (x \mid y).
Theorem 19.1 (Law of total expectation (LTE)) Let X, Y be jointly distributed random variables. The expectation of g (X) can be calculated from its conditional expectation
\mathbb{E}_{X} [g (X)] = \sum_{y \in \mathbb{R}} \mathbb{E}_{X \mid Y} [g (X) \mid y] \mathbb{P}_{Y} (y).
By expanding \mathbb{E}_{X \mid Y} [g (X) \mid y], we have
\begin{aligned}
\sum_{y \in \mathbb{R}} \mathbb{E}_{X \mid Y} [g(X) \mid y] \mathbb{P}_{Y} (y)
& = \sum_{y \in \mathbb{R}} \left(
\sum_{x \in \mathbb{R}} g(x) \mathbb{P}_{X \mid Y} (x \mid y)
\right) \mathbb{P}_{Y} (y)
\\
& = \sum_{x \in \mathbb{R}} \sum_{y \in \mathbb{R}} g (x) \mathbb{P}_{X \mid Y} (x \mid y) \mathbb{P}_{Y} (y)
\\
& = \sum_{x \in \mathbb{R}} g (x) \sum_{y \in \mathbb{R}} \mathbb{P}_{X, Y} (x, y)
\\
& = \sum_{x \in \mathbb{R}} g (x) \mathbb{P}_{X} (x)
\\
& = \mathbb{E}_{X} [g (X)].
\end{aligned}
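The law of total expectation can likewise be verified numerically; the joint pmf and the transformation g in this sketch are illustrative assumptions.

```python
# Numerical check of the law of total expectation on a small joint pmf.
pmf_XY = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
g = lambda x: 2 * x + 1

pmf_X = {x: sum(p for (xx, _), p in pmf_XY.items() if xx == x) for x in (0, 1)}
pmf_Y = {y: sum(p for (_, yy), p in pmf_XY.items() if yy == y) for y in (0, 1)}

def cond_E_g(y):
    """E[g(X) | Y = y] computed from the conditional pmf P_{X|Y}(x | y)."""
    return sum(g(x) * p / pmf_Y[y] for (x, yy), p in pmf_XY.items() if yy == y)

lhs = sum(cond_E_g(y) * pmf_Y[y] for y in pmf_Y)   # sum_y E[g(X) | y] P_Y(y)
rhs = sum(g(x) * p for x, p in pmf_X.items())      # E[g(X)]
assert abs(lhs - rhs) < 1e-12
```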
Variance
The variance summarizes how much a random variable deviates from its mean on average.
Definition 19.3 The variance of a random variable X is defined to be
\mathrm{Var} [X] = \mathbb{E}_{X} [(X - \mathbb{E}_{X} [X])^{2}].
Corollary 19.4 The variance can also be calculated as
\mathrm{Var} [X] = \mathbb{E}_{X} [X^{2}] - \mathbb{E}_{X} [X]^{2}.
Let \mu = \mathbb{E}_{X} [X]. By Definition 19.3, we have
\begin{aligned}
\mathrm{Var} [X]
& = \mathbb{E}_{X} [(X - \mu)^2]
\\
& = \mathbb{E}_{X} [X^2 - 2 \mu X + \mu^2]
\\
& = \mathbb{E}_{X} [X^2] - 2 \mu\mathbb{E} [X] + \mu^2
& [\text{linearity of expectation}]
\\
& = \mathbb{E}_{X} [X^2] - 2 \mathbb{E} [X]^2 + \mathbb{E} [X]^2
\\
& = \mathbb{E}_{X} [X^2] - \mathbb{E} [X]^2
\end{aligned}
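As a sanity check, the short sketch below (with an illustrative pmf) confirms that the defining formula and the formula from Corollary 19.4 give the same variance.

```python
# Numerical check that E[(X - E[X])^2] equals E[X^2] - E[X]^2.
pmf = {1: 0.2, 2: 0.5, 4: 0.3}

E_X = sum(x * p for x, p in pmf.items())
var_def = sum((x - E_X) ** 2 * p for x, p in pmf.items())     # Definition 19.3
var_alt = sum(x ** 2 * p for x, p in pmf.items()) - E_X ** 2  # Corollary 19.4
assert abs(var_def - var_alt) < 1e-12
```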
Corollary 19.5 For any scalars a, b \in \mathbb{R}, the variance satisfies
\mathrm{Var} [X + b] = \mathrm{Var} [X]
\quad \text{and} \quad
\mathrm{Var} [a X] = a^{2} \mathrm{Var} [X],
and therefore \mathrm{Var} [a X + b] = a^{2} \mathrm{Var} [X].
First we show that the variance is invariant to shifts. By Definition 19.3,
\begin{aligned}
\mathrm{Var} [X + b]
& = \mathbb{E} [((X + b) - \mathbb{E} [X + b])^{2}]
\\
& = \mathbb{E} [(X + b - \mathbb{E} [X] - b)^2]
\\
& = \mathbb{E} [(X - \mathbb{E} [X])^2]
\\
& = \mathrm{Var} [X].
\end{aligned}
Then we show that scaling a random variable by a scales its variance by a^{2}. By Corollary 19.4,
\begin{aligned}
\mathrm{Var} [aX]
& = \mathbb{E} [(aX)^2] - (\mathbb{E} [aX])^2
\\
& = \mathbb{E} [a^2 X^2] - (a \mathbb{E} [X])^2
\\
& = a^2 \mathbb{E} [X^2] - a^2 \mathbb{E} [X]^2
\\
& = a^2 (\mathbb{E} [X^2] - \mathbb{E} [X]^2)
\\
& = a^2 \mathrm{Var} [X].
\end{aligned}
Combining the two results gives \mathrm{Var} [a X + b] = \mathrm{Var} [a X] = a^{2} \mathrm{Var} [X].
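The scaling and shift behaviour of the variance can be checked numerically as well; the pmf and the constants a, b below are illustrative assumptions.

```python
# Numerical check of Corollary 19.5: Var[aX + b] = a^2 Var[X].
pmf = {-1: 0.25, 0: 0.25, 3: 0.5}
a, b = -2.0, 7.0

def var(values_probs):
    """Variance of a discrete distribution given as (value, probability) pairs."""
    e = sum(x * p for x, p in values_probs)
    return sum((x - e) ** 2 * p for x, p in values_probs)

var_X = var(list(pmf.items()))
var_aXb = var([(a * x + b, p) for x, p in pmf.items()])
assert abs(var_aXb - a ** 2 * var_X) < 1e-12
```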
Standard deviation
Another measure of a random variable X’s spread is the standard deviation.
Definition 19.4 The standard deviation of a random variable X is
\sigma_{X} = \sqrt{\mathrm{Var} [X]}.
Covariance
Given two random variables X, Y with a joint distribution \mathbb{P}_{X, Y} (x, y), the covariance describes how they vary together. If the covariance is positive, the two variables tend to be above (or below) their means together; if it is negative, one tends to be above its mean when the other is below it.
Definition 19.5 (Covariance) Let X, Y be random variables. The covariance between X and Y is
\mathrm{Cov} [X, Y] = \mathbb{E}_{X, Y} [(X - \mathbb{E}_{X} [X]) (Y - \mathbb{E}_{Y} [Y])].
Corollary 19.6 The covariance can also be calculated as
\mathrm{Cov} [X, Y] = \mathbb{E}_{X, Y} [X Y] - \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y].
Let \mu_{X} = \mathbb{E}_{X} [X] and \mu_{Y} = \mathbb{E}_{Y} [Y]. By Definition 19.5, we have
\begin{aligned}
\mathrm{Cov} [X, Y]
& = \mathbb{E}_{X, Y} [(X - \mu_{X}) (Y - \mu_{Y})]
\\
& = \mathbb{E}_{X, Y} [X Y - X \mu_{Y} - \mu_{X} Y + \mu_{X} \mu_{Y}]
\\
& = \mathbb{E}_{X, Y} [X Y] - \mu_{X} \mu_{Y} - \mu_{X} \mu_{Y} + \mu_{X} \mu_{Y}
\\
& = \mathbb{E}_{X, Y} [X Y] - \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y].
\end{aligned}
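The two expressions for the covariance can be compared numerically; the joint pmf in this sketch is an illustrative assumption (X and Y are deliberately not independent).

```python
# Numerical check that E[(X - E[X])(Y - E[Y])] equals E[XY] - E[X] E[Y].
pmf_XY = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

E_X = sum(x * p for (x, _), p in pmf_XY.items())
E_Y = sum(y * p for (_, y), p in pmf_XY.items())

cov_def = sum((x - E_X) * (y - E_Y) * p for (x, y), p in pmf_XY.items())  # Definition 19.5
cov_alt = sum(x * y * p for (x, y), p in pmf_XY.items()) - E_X * E_Y      # Corollary 19.6
assert abs(cov_def - cov_alt) < 1e-12
```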
Corollary 19.7 The covariance has the following properties.
Invariant to shifting
\mathrm{Cov} [X + a, Y] = \mathrm{Cov} [X, Y].
Linear transformation
\mathrm{Cov} [a X + b Y, Z] = a \mathrm{Cov} [X, Z] + b \mathrm{Cov} [Y, Z].
Covariance of sum of random variables
\mathrm{Cov} [X + A, Y + B] = \mathrm{Cov} [X, Y] + \mathrm{Cov} [X, B] + \mathrm{Cov} [A, Y] + \mathrm{Cov} [A, B].
All three properties can be proved from the definition of the covariance (Definition 19.5) using Corollary 19.1.
\begin{aligned}
\mathrm{Cov} [X + a, Y]
& = \mathbb{E}_{X, Y} [(X + a - \mathbb{E} [X + a]) (Y - \mathbb{E} [Y])]
\\
& = \mathbb{E}_{X, Y} [(X + a - \mathbb{E} [X] - a) (Y - \mathbb{E} [Y])]
\\
& = \mathbb{E}_{X, Y} [(X - \mathbb{E} [X]) (Y - \mathbb{E} [Y])]
\\
& = \mathrm{Cov} [X, Y].
\end{aligned}
\begin{aligned}
\mathrm{Cov} [aX + bY, Z]
& = \mathbb{E}_{X, Y, Z} [(aX + bY - \mathbb{E}_{X, Y} [aX + bY]) (Z - \mathbb{E}_{Z} [Z])]
\\
& = \mathbb{E}_{X, Y, Z} [a(X - \mathbb{E}_{X} [X]) (Z - \mathbb{E}_{Z} [Z]) + b(Y - \mathbb{E}_{Y} [Y]) (Z - \mathbb{E}_{Z} [Z])]
\\
& = a\mathbb{E}_{X, Z} [(X - \mathbb{E}_{X} [X]) (Z - \mathbb{E}_{Z} [Z])] + b\mathbb{E}_{Y, Z} [(Y - \mathbb{E}_{Y} [Y]) (Z - \mathbb{E}_{Z} [Z])]
\\
& = a\mathrm{Cov} [X, Z] + b\mathrm{Cov} [Y, Z].
\end{aligned}
\begin{aligned}
\mathrm{Cov} [X + A, Y + B]
& = \mathbb{E}_{X, A, Y, B} [(X + A - \mathbb{E}_{X, A} [X + A]) (Y + B - \mathbb{E}_{Y, B} [Y + B])]
\\
& = \mathbb{E}_{X, A, Y, B} [(X - \mathbb{E}_{X} [X] + A - \mathbb{E}_{A} [A]) (Y - \mathbb{E}_{Y} [Y] + B - \mathbb{E}_{B} [B])]
\\
& = \mathbb{E}_{X, Y} [(X - \mathbb{E}_{X} [X]) (Y - \mathbb{E}_{Y} [Y])] + \mathbb{E}_{X, B} [(X - \mathbb{E}_{X} [X]) (B - \mathbb{E}_{B} [B])]
\\
& \quad + \mathbb{E}_{A, Y} [(A - \mathbb{E}_{A} [A]) (Y - \mathbb{E}_{Y} [Y])] + \mathbb{E}_{A, B} [(A - \mathbb{E}_{A} [A]) (B - \mathbb{E}_{B} [B])]
\\
& = \mathrm{Cov} [X, Y] + \mathrm{Cov} [X, B] + \mathrm{Cov} [A, Y] + \mathrm{Cov} [A, B].
\end{aligned}
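Since these identities hold for any joint distribution, they also hold for the empirical distribution of a sample, which makes them easy to verify numerically. The data and constants below are illustrative assumptions.

```python
# Numerical check of the covariance properties in Corollary 19.7 under the
# empirical distribution of randomly generated samples.
import numpy as np

rng = np.random.default_rng(0)
X, Y, Z = rng.normal(size=(3, 1000))
A, B = rng.normal(size=(2, 1000))
a, b = 1.5, -0.5

def cov(u, v):
    """Covariance under the empirical distribution of the samples."""
    return np.mean((u - u.mean()) * (v - v.mean()))

assert np.isclose(cov(X + a, Y), cov(X, Y))                              # shift invariance
assert np.isclose(cov(a * X + b * Y, Z), a * cov(X, Z) + b * cov(Y, Z))  # linearity
assert np.isclose(cov(X + A, Y + B),
                  cov(X, Y) + cov(X, B) + cov(A, Y) + cov(A, B))         # sum expansion
```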
Corollary 19.8 If X and Y are independent, their covariance is 0
\mathrm{Cov} [X, Y] = 0.
According to Corollary 19.3, we have that
\mathbb{E}_{X, Y} [X Y] = \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y].
Therefore according to Corollary 19.6, we have
\mathrm{Cov} [X, Y] = \mathbb{E}_{X, Y} [X Y] - \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y] = 0.
Corollary 19.9 For any random variables X and Y
\mathrm{Var} [X + Y] = \mathrm{Var} [X] + \mathrm{Var} [Y] + 2 \mathrm{Cov} [X, Y].
Since the variance of a random variable is its covariance with itself,
\begin{aligned}
\mathrm{Var} [X + Y]
& = \mathrm{Cov} [X + Y, X + Y]
\\
& = \mathrm{Cov} [X, X] + \mathrm{Cov} [X, Y] + \mathrm{Cov} [Y, X] + \mathrm{Cov} [Y, Y]
\\
& = \mathrm{Var} [X] + \mathrm{Var} [Y] + 2 \mathrm{Cov} [X, Y].
\end{aligned}
In the second equality we used the third property of Corollary 19.7.
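A short numerical confirmation of Corollary 19.9 on a small joint pmf (the pmf is an illustrative assumption):

```python
# Numerical check that Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
pmf_XY = {(0, 1): 0.2, (1, 1): 0.3, (1, 3): 0.1, (2, 0): 0.4}

E_X = sum(x * p for (x, _), p in pmf_XY.items())
E_Y = sum(y * p for (_, y), p in pmf_XY.items())
var_X = sum((x - E_X) ** 2 * p for (x, _), p in pmf_XY.items())
var_Y = sum((y - E_Y) ** 2 * p for (_, y), p in pmf_XY.items())
cov_XY = sum((x - E_X) * (y - E_Y) * p for (x, y), p in pmf_XY.items())

E_S = E_X + E_Y
var_S = sum((x + y - E_S) ** 2 * p for (x, y), p in pmf_XY.items())  # Var[X + Y] directly
assert abs(var_S - (var_X + var_Y + 2 * cov_XY)) < 1e-12
```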
Correlation
The problem with covariance in describing the relation between two random variables is that its value is affected by the variance of each individual random variable. The correlation coefficient is a normalized covariance that is invariant to the scaling of the individual random variables.
Definition 19.6 Let X, Y be random variables. The correlation coefficient between X and Y is
\rho (X, Y) = \frac{ \mathrm{Cov} [X, Y] }{ \sqrt{\mathrm{Var} [X]} \sqrt{\mathrm{Var} [Y]} }.
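The scale invariance of the correlation coefficient can be checked numerically; the samples and the shift and positive scaling constants below are illustrative assumptions.

```python
# Numerical check that the correlation coefficient is unchanged by shifting
# and positively scaling the individual variables (empirical distribution).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=1000)
Y = 0.6 * X + rng.normal(size=1000)  # correlated with X by construction

def corr(u, v):
    """Correlation coefficient under the empirical distribution of the samples."""
    cu, cv = u - u.mean(), v - v.mean()
    return np.mean(cu * cv) / np.sqrt(np.mean(cu ** 2) * np.mean(cv ** 2))

assert np.isclose(corr(X, Y), corr(3.0 * X + 2.0, 0.5 * Y - 1.0))
```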